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Abstract. The simulations of extensive air showers 
as well as the detectors involved in their detection 
play a fundamental role in the study of the high 
energy cosmic rays. At the highest energies the 
detailed simulation of air showers is very costly in 
processing time and disk space due to the large 
number of secondary particles generated in inter- 
actions with the atmosphere, e.g. ~ 10 11 for 10 20 eV 
proton shower. Therefore, in order to increase the 
statistics, it is quite common to recycle single showers 
many times to simulate the detector response. In this 
work we present a detailed study of the artificial 
effects introduced by the multiple use of single air 
showers for the detector simulations. In particular, 
we study the effects introduced by the repetitions in 
the kernel density estimators which are frequently 
used in composition studies. 
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I. Introduction 

Air shower and detector simulations play a funda- 
mental role in the study of cosmic rays. In particular, 
arrays of surface detectors that do not have fluorescence 
telescopes to calibrate the energy scale, must resort to 
simulated data in order to estimate the energy of the 
primary particle. Furthermore, the primary mass is also 
obtained comparing experimental data with simulations. 

There are several Monte Carlo programs for air 
shower simulation, the most used in the literature are 
AIRES ID, CORSIKA 0, and CONEX 0, the latter 
for a fast simulation of the longitudinal shower devel- 
opment. Since the number of particles produced in a 
shower can be extremely large, e.g., ~ 10 11 for a 10 20 
eV proton shower, the computer processing time and 
disk space needed are also very large, even if unthinning 
methods J4), Q are used. Due to this difficulty it 
is a common practice to reuse the same shower for 
generating several events (see for example Q). This 
practice is more common in simulations that include 
surface detectors because, for fluorescence telescopes, 
very fast Monte Carlo programs like CONEX, have 
very fast and efficient algorithms for the generation of 
longitudinal profiles. 

In this work we present a study of the effects of using 
multiple repetitions of individual showers ||9|, applied 
to the simulation of detectors, on the evaluation of 
standard estimators of the expected value, variance, and 



covariance. We study in detail the effects introduced 
in the kernel density estimators, which are analytical 
estimates of the underlying distribution function ob- 
tained from a finite sample of events. In cosmic rays 
physics this technique is used mainly in connection with 
composition analyses (see for example iflOl . Q, (HI); 
however, it is also extensively used in many different 
areas of knowledge [ 1 1 1 to which this work can be 
directly extended. 

II. Analytical Treatment 

As mentioned in the introduction, we want to study 
the potential distortions introduced by reusing individual 
showers to maximize the statistics when simulating the 
response of a detector. Let us start with the optimum 
case in which each individual shower is used only once 
and, therefore, best reproduces reality. 

Let y be a d-dimensional vector composed by physical 
observables (e.g. mass sensitive parameters) distributed 
as g(y) and let z be a random vector, distributed as h(z), 
that takes into account the effects of the detectors and 
the corresponding reconstruction method such that, after 
measuring and reconstructing the empirical information, 
a vector x = y + z is obtained. The distribution function 
of x is the convolution of g(y) and h(z), 



/(x) =go /i(x) = J dy g(y)h(x - y) 



(1) 



Suppose that we have a sample of N independent 
events of the distribution /, xi = yi + zi,. .. , xjy = 
yjv + zjy. The probability of this configuration can be 
written as, 



P(yi ■ • -yw,zi 



• ZjvJ 



P(Xi . . . Xjv) = 



fl'(yi) • • - .g(yw) x 

ft(zi) . ..h(z N ), 

/(x 1 ).../(x A r). (2) 



However, as previously noted, if single showers are 
recycled and used many times to simulate the response 
of the detectors, non-independent samples are obtained. 
If we use each shower of a sample of M independent 
showers m times to simulate the detectors response, 
the following sample of size N — M x m is ob- 
tained, xn = yi + zn, . . . , Xi m = yi + zi m , . . . , 
xwi = Ym + zmi, ■ ■ ■ , XMm = Ym + ZMm, where 
the notation used henceforth corresponds to £ l aa , where 
i is the ith coordinate of vector £, a indicates the 
number of independent shower and a the number of 
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detector simulation performed using the ath shower. 
The probability of such configuration is given by 



M 



P{yi ■ ■ -yjVjZll • ■ -ZMm) = J\ g(y a ) Y[ h ( Z aa) 

a—1 a—1 

M . 

P(xn...xM ra ) = I [ / dy a g(y a ) x 

a=l •* 

m 

Yl h(x aa -y a ). (3) 
0=1 

A. Mean, variance and covariance estimators 

Let us consider the average of the ith coordinate of 
x, x\ for the realistic case in which each shower is used 
only once to simulate the detector response, 



1 N 
N E : 

a=l 



(4) 



By using Eq. (0 it is easy to obtain the very well known 
expressions for the expected value and variance of x 1 , 



E[x*] = E[x*] 
Var[x l ] = ^- Var[x l 



(5) 
(6) 



The usual estimator of the covariance between two 
random variables is given by, 



1 



N 
a=l 



{x l a -x l )(xi-x 3 ). 



(7) 



For i = j the estimator of the variance of x 1 is obtained, 
sf = Cu. By using Eq. (|2]i it can be shown that both 
estimators are non-biased, 

£?[Cy = cov[x\x j ], (8) 
E[s\] = E[C ii ] = Var[x i ]. (9) 

For the case in which each shower is used several 
times to simulate the response of the detectors the 
average of x l is given by, 



^ M m 

X ' 1 = iWm SS) 1 ^- 

a—1 a—1 

From Eqs. (13110b it can be shown that, 



Var[. 



E[x' z ] = E[x l 
1 



Var\x l 



m — 1 



(10) 

(ID 
dydxidx-2 



Mm Mm 
(x\ - E[a^])(4 - Elx*]) x 
gfr)h(x 1 -y)h(x.2-y), (12) 

which means that using samples obtained by reusing 
individual showers to simulate the detector response 
does not introduce any bias when calculating the av- 
erage. However the fluctuations of x % are increased by 
the generation of an additional term proportional to 
(m - 1)1 Mm. 



The estimator of the covariance, between x % and x 3 , 
including multiple repetitions of the individual showers 
takes the form, 

M m 

^ = M~m~T £ 5>« " ~ (13) 

a—1 a—1 

The expected value of the covariance estimator is 
obtained from Eqs. (O and ( 1131 . 



rn — 1 
Mm 



rfyrfxi dx.2 



(x i 1 -E[x i ])(xi-E[x 3 ])g(y)x 

h( x i - y) h ( x 2 - y)- 



(14) 



Therefore, as expected, the repetition of individual 
showers introduces a bias in the covariance estimator 
because the events are not independent. The bias results 
proportional to (m — 1)/Mm. 

As mentioned before, the expected value of the vari- 
ance estimator is obtained setting i = j in Eq. (TT4l >. 

E[s't] = Varix 1 } - ( dyd^d^ 2 {x\ - 



(15) 



Mm 

E[x l ]){x 2 ~E[x 1 ]) g(y) x 
h(xi -y) h(x 2 -y), 



which shows that also s' i is now a biased estimator of 
the variance of x l . 

B. Density estimators 

The density estimation technique consist in obtaining 
an estimator of the underlying density function from a 
given data sample ifTTl . In one of the most widely used 
variants of that technique, a density estimator is obtained 
from a superposition of kernel functions centered at each 
event of the data sample. For d-dimensional data the 
kernel density estimator can be written as, 



/(x) 



1 

N 



N 

E 



1 



\H\ 



K(H 



-1/2 



(x-x«)), (16) 



where x is a d-dimensional vector, H is a symmetric, 
positively defined matrix (i.e., the symmetric, positively 
defined square-root matrix i? -1 / 2 exists) and K(u) is 
the kernel function. The matrix H gives the covariance 
between the different pairs of variables and also the de- 
gree of smoothing, i.e., the width of the kernel function. 

From Eqs. (f2|i and ( fTol l the expected value of the 
density estimator is obtained, 



£[/(x)] 



1 



\H\ 



J dx' K(H- 1/2 ■ (x - x')) /(x) 



(17) 

which shows that /(x) is a biased estimator of /(x). 

By using the Taylor expansion and retaining the domi- 
nant terms an approximated expression for the integrated 
mean square error IMSE = J dx E[(f(x) — f(x)) 2 } 
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is obtained, 

IMSE 



where 

-.2 



^ Jdx J duif^u^ 1 / 2 : 



D 2 f(x) V V*u 



R(K) 



N h d y/\V\ 



,(18) 



(19) 

Here we take H~ x l 2 = V -1 / 2 /h, where h is a small 
parameter that parametrizes the degree of smoothing. 

Minimizing IM SE with respect to h, the well known 
expression of h opt is recovered, 

1 



h opt oc 



(20) 



7V 1 /('i+4) ' 

where the constant of proportionality depends on /(x), 
the unknown density function that we want to estimate. 

Let us consider the case in which shower repetitions of 
individual showers are included. The density estimator 
in this case is given by, 

M m 




It can be seen from Eqs. © and (fJTJ, that the bias 
does not change when the repetitions are introduced. 
However, as expected, the variance increases, 

1 TO — 1 



Var[f(x)] 



Aim 



R{K) /(x) 



Mm 



J dy 5 (y)ft 2 (x-y)-/ 2 (x)j , 



(22) 

where just the leading terms are retained. Consequently, 
the IMSE takes in this particular case the form 

IMSE' = y J dx J du K(u)u T V 1/2 x 

R(K) 



D 2 f{x)V 1 ' 2 n 
m — 1 



Mm h d y/\V\ 



Mm 
m — 1 
Mm 



J dxdy g(y)h 2 (x-y) - 
R(f)- (23) 



Eq. (l23l shows that the leading term introduced by 
the repetitions does not depend on h and, therefore, 
the expression for h opt remains equal to the to = 1 
case. The only effect introduced by the repetitions of 
the individual showers is to increase the fluctuations of 
the estimator for each x. 

III. Numerical Example 

In this section a numerical example that shows the 
predicted effects introduced by the shower repetitions 
is given. For that purpose, air showers simulations are 



performed using the program CONEX. A total of N a h = 
11000 proton showers of primary energy E = 10 19 eV 
and zenith angle 9 = 30° are generated. 

Samples of the parameter X max obtained from the 
CONEX simulations are considered. A Gaussian uncer- 
tainty of a[X max ) = 20 g cm~ 2 and [i = is assumed in 
order to take into account the detector response and the 
reconstruction method. Therefore, the distribution func- 
tion of the reconstructed X max is given by Eq. ([TJ with 
g(X max ) the distribution function corresponding to the 
physical fluctuations and h(X) = G(X;0,<7[X max \), 
a Gaussian distribution of mean value fi = and 
a = cr[X max ], which takes into account the response 
of the detectors and reconstruction methods. 

Four sets of 100 samples are considered. Each set 
of samples is noted as S(M,m) where M indicates the 
independent values of X max (obtained from CONEX) 
in each sample and m the number of repetitions of 
each shower, i.e., the number of times that the Gaussian 
distribution h(X) = G(X; X % maxl a[X max \) is sampled 
for each of the M independent values X % max in each 
individual sample. Therefore, 5(no,i) 



£(10, 11), #(110,1) 



and 5(22, 5) are considered, where S^nci) and S, 110 ^ 
just differ in the different values obtained from the 
Gaussian distribution performed to include the detector 
response and reconstruction method. The number of 
events in each sample, belonging to the different sets, is 
N ev = M x to = 110, the same for all kind of samples 
considered. 

Figure Q] shows the distributions of the estimators of 
the average, X max , and the standard deviation, s[X max ], 
for the sets of samples considered. It can be seen that, 
as expected, when the repetitions are included, the fluc- 
tuations increase and when the number of independent 
showers increases the fluctuations decrease. Figure [T] 
also shows that, although the distributions of s[X maa; ] 
with repetitions have a tail towards larger values of 
grammage, which is not present in the corresponding 
without repetitions, the bias is not statistically signi- 
ficative. This is consistent with Eq. (fl3T l which shows 
that the expected bias introduced by repetitions in the 
variance is proportional to (m — 1)/Mm = 0.1 for 

£(10, 11)- 

In order to illustrate the effects of repetitions on the 
density estimators, one-dimensional Gaussian kernels 
are used to estimate the density function of X max . An 
adaptive bandwidth method, introduced by B. Silverman 
ifTTl . is used to obtain better estimates of the density 
function. 

For each sample belonging to a given set a density 
estimate is obtained, therefore, 110 density estimates are 
obtained for each set of samples considered. Figure |2] 
shows the mean value and the one sigma region obtained 
from the density estimates of each set. It can be seen 
that the mean values corresponding to samples with or 
without repetitions are very similar, which is consistent 
with the result obtained in subsection III-BI Also, as 
expected from Eq. d22l . the fluctuations corresponding 
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Fig. 1. Distributions of X max and s[X m ax] for the different sets Fig. 2. Mean and one sigma regions for the density estimates obtained 



of samples considered. 



from the different samples considered. Darker regions and dotted lines 
correspond to samples including multiple repetitions. 



to sets including repetition are larger and comparing 
the results obtained for S^icii) and SV22, 5) we see that 
the fluctuations in the latter case are smaller due to the 
smaller number of repetitions. 

IV. Conclusions 

In this work we present a study of the effects of 
recycling individual cosmic ray showers to simulate 
the detector response, which is a common practice in 
Monte Carlo simulations at the highest energies. We 
find that the standard estimators of the expected value, 
variance and covariance are modified. In particular, 
the average remains as a non-biased estimator of the 
expected value but the fluctuations are increased. For the 
standard estimators of the variance and covariance a bias 
proportional to (m — 1)/Mm appears when repetitions 
are included. Besides, as in the case of the average, the 
fluctuations of both estimators are increased. Finally, 
we study the effects introduced by repetition in the 
kernel density estimators obtained from finite samples. 
We find again that the expected value of the estimator is 
unchanged, i.e., the bias takes the same form. However, 



the pointwise fluctuations are increased and become 
more important as the ratio (m — 1)/Mm increases. 
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